If you already know UNIX-style regular expressions, here are the main differences between the regular expressions used in Nuance OmniPage Capture SDK and in most UNIX implementations:
- In ImageGear Recognition, anchors ('^', '$') are implicit. Because each regular expression describes a single field, there is rarely any need for the pattern to match anywhere within a longer text. In other words, Capture SDK always assumes a '^' at the beginning and a '$' at the end of your regular expression. It does no harm to put them there explicitly, however. If you do want your pattern to match anywhere within a longer text, put ".*" at both the beginning and the end of your regular expression.
- In ImageGear Recognition, regular expressions are composed of Unicode characters rather than ASCII, ANSI, or other 8-bit encodings. This makes it possible to use national characters (like 'Á', 'ü', 'Ñ', etc.) freely. The special character classes "\l" and "\u" match any lowercase or uppercase character, respectively, as defined by the current language setting within ImageGear Recognition.
- Back-references are not supported, so you cannot use a pattern such as "(.*)\1".
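
The differences listed above can be illustrated with Python's standard `re` module, which behaves like a typical UNIX-style implementation. This is only an analogy, not the SDK's API: the helper `field_match` is a hypothetical name that uses `re.fullmatch` to imitate implicit anchoring, and Python's pattern syntax is assumed to be close enough to the SDK's for these simple patterns. The last part sketches one general way to do without a back-reference when the repeated text comes from a small, known set.

```python
import re


def field_match(pattern: str, text: str) -> bool:
    """Imitate implicit anchors: the pattern must cover the whole field."""
    return re.fullmatch(pattern, text) is not None


# UNIX-style search: the pattern may match anywhere in the text.
unanchored = re.search(r"[0-9]+", "ID 12345") is not None

# Implicit anchors: the same pattern must match the entire field.
whole_field = field_match(r"[0-9]+", "12345")
partial_field = field_match(r"[0-9]+", "ID 12345")

# Writing the anchors explicitly changes nothing.
explicit_anchors = field_match(r"^[0-9]+$", "12345")

# Wrapping the pattern in ".*" restores match-anywhere behavior.
anywhere = field_match(r".*[0-9]+.*", "ID 12345")

# Without back-references, a pattern like "([0-9])\1" (a doubled digit)
# can be spelled out as an explicit alternation: "00|11|...|99".
doubled_digit = "|".join(d * 2 for d in "0123456789")
same_digits = field_match(doubled_digit, "77")
different_digits = field_match(doubled_digit, "78")
```

Here `unanchored`, `whole_field`, `explicit_anchors`, `anywhere`, and `same_digits` evaluate to `True`, while `partial_field` and `different_digits` evaluate to `False`. Spelling out the alternation is practical only when the set of repeated strings is small; a general pattern such as "(.*)\1" has no equivalent without back-references.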